Skip to content

Find and fix a bug#3

Merged
giosilvi merged 4 commits intomainfrom
claude/find-fix-bug-mkcojkgj4w2a4o5s-H8t0E
Jan 19, 2026
Merged

Find and fix a bug#3
giosilvi merged 4 commits intomainfrom
claude/find-fix-bug-mkcojkgj4w2a4o5s-H8t0E

Conversation

@giosilvi
Copy link
Copy Markdown
Owner

No description provided.

Replace open() without context manager with proper with statement
to ensure file handle is closed after reading. This prevents
resource leaks that could accumulate over multiple invocations.

Fixed: groundkg/ner_tag.py:51
Replace all open() calls without context managers with proper with
statements to ensure file handles are closed after use. This prevents
resource leaks that could accumulate over multiple invocations.

Fixed resource leaks in:
- groundkg/re_score.py:53 - json.load(open(...))
- groundkg/re_infer.py:96-97 - two json.load(open(...))
- tools/promote_from_scored.py:5 - json.load(open(...))
- training/train_re_sklearn.py:14 - load_jsonl function
- tools/adjust_thresholds.py:47 - json.load(open(...))
- tools/quality_report.py:31 - json.load(open(...))
web/app.py:
- Fix file handle leak in get_file_info() when counting lines (line 70)
- Fix file handle leak in get_file() when counting total lines (line 194)
Both now properly use context managers to ensure files are closed

Makefile.gk:
- Fix critical resource leak in manifest.jsonl generation (line 147)
- Replace inline file opens with proper read_file() helper function
- Use context manager for output file writing
- Prevents resource exhaustion when processing many corpus files
@giosilvi
Copy link
Copy Markdown
Owner Author

@claude take a look

…nditions

tools/crawl.py:
- Fix fragile locals() check for soup object (line 87)
- Replace 'soup' in locals() with explicit None check
- Initialize soup = None before try block for proper tracking

groundkg/re_score.py:
- Add thread safety to global embedder cache (line 16-27)
- Implement double-checked locking pattern with threading.Lock
- Prevents race condition when multiple threads initialize model
- Ensures only one model instance is created even under concurrent access

web/app.py:
- Fix potential race condition in command_status() (line 122-150)
- Store process reference locally to prevent stale references
- Use .pop() instead of del to handle concurrent deletions safely
- Add timeout to communicate() to prevent indefinite blocking
- Add exception handling for robust error reporting
@giosilvi
Copy link
Copy Markdown
Owner Author

@claude what do you think of this PR?

@giosilvi giosilvi merged commit fe19206 into main Jan 19, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants